14 research outputs found

K-Theory Of Root Stacks And Its Application To Equivariant K-Theory

Author: Kobyzev Ivan
Publication venue: Scholarship@Western
Publication date: 04/08/2016

We give a definition of a root stack and describe its most basic properties. Then we recall the necessary background (Abhyankar’s lemma, Chevalley-Shephard-Todd theorem, Luna’s étale slice theorem) and prove that under some conditions a quotient stack is a root stack. Then we compute G-theory and K-theory of a root stack. These results are used to formulate the theorem on equivariant algebraic K-theory of schemes.

Scholarship@Western

DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation

Author: Ghodsi Ali
Kobyzev Ivan
Rezagholizadeh Mehdi
Valipour Mojtaba
Publication date: 19/04/2023

With the ever-growing size of pretrained models (PMs), fine-tuning them has become more expensive and resource-hungry. As a remedy, low-rank adapters (LoRA) keep the main pretrained weights of the model frozen and just introduce some learnable truncated SVD modules (so-called LoRA blocks) to the model. While LoRA blocks are parameter-efficient, they suffer from two major problems: first, the size of these blocks is fixed and cannot be modified after training (for example, if we need to change the rank of LoRA blocks, then we need to re-train them from scratch); second, optimizing their rank requires an exhaustive search and effort. In this work, we introduce a dynamic low-rank adaptation (DyLoRA) technique to address these two problems together. Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training. We evaluate our solution on different natural language understanding (GLUE benchmark) and language generation tasks (E2E, DART and WebNLG) using different pretrained models such as RoBERTa and GPT with different sizes. Our results show that we can train dynamic search-free models with DyLoRA at least 4 to 7 times (depending on the task) faster than LoRA without significantly compromising performance. Moreover, our models can perform consistently well on a much larger range of ranks compared to LoRA. Comment: Accepted to EACL 202

arXiv.org e-Print Archive
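
The idea in the DyLoRA abstract above (a frozen pretrained weight plus a low-rank update that can be used at several ranks) can be illustrated with a minimal pure-Python sketch. The names `truncated_lora_forward` and `sample_rank`, the `alpha / rank` scaling, and the uniform rank sampling are assumptions made for illustration; they are not taken from the paper or from any library.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def truncated_lora_forward(w_frozen, w_down, w_up, x, rank, alpha=1.0):
    """Frozen layer output plus a low-rank update truncated to `rank`.

    w_frozen: d_out x d_in pretrained weight (never updated)
    w_down:   r_max x d_in adapter down-projection
    w_up:     d_out x r_max adapter up-projection
    Only the first `rank` rows of w_down and the first `rank` columns
    of w_up are used, so one adapter can serve several ranks.
    """
    base = matvec(w_frozen, x)
    hidden = matvec(w_down[:rank], x)
    update = matvec([row[:rank] for row in w_up], hidden)
    scale = alpha / rank  # assumed scaling convention
    return [b + scale * u for b, u in zip(base, update)]


def sample_rank(r_min, r_max, rng=random):
    """Pick a rank for one training step (assumed uniform sampling)."""
    return rng.randint(r_min, r_max)


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [2.0, 3.0]
out_rank1 = truncated_lora_forward(identity, identity, identity, x, rank=1)
out_rank2 = truncated_lora_forward(identity, identity, identity, x, rank=2)
```

In a training loop, one would call `sample_rank` at each step and update only the truncated adapter slices, so that the adapter remains usable at any rank in the sampled range at inference time.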

Attribute Controlled Dialogue Prompting

Author: Kobyzev Ivan
Liu Runcheng
Poupart Pascal
Rashid Ahmad
Rezagholizadeh Mehdi
Publication date: 11/07/2023

Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks. However, both discrete prompting and continuous prompting assume fixed prompts for all data samples within a task, neglecting the fact that inputs vary greatly in some tasks such as open-domain dialogue generation. In this paper, we present a novel, instance-specific prompt-tuning algorithm for dialogue generation. Specifically, we generate prompts based on instance-level control code, rather than the conversation history, to explore their impact on controlled dialogue generation. Experiments on popular open-domain dialogue datasets, evaluated on both automated metrics and human evaluation, demonstrate that our method is superior to prompting baselines and comparable to fine-tuning with only 5%-6% of total parameters. Comment: Accepted at ACL 2023 In Finding

arXiv.org e-Print Archive

Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization

Author: Ghodsi Ali
Jafari Aref
Kobyzev Ivan
Poupart Pascal
Rezagholizadeh Mehdi
Publication date: 12/12/2022

Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher). Although KD methods achieve state-of-the-art performance in numerous settings, they suffer from several problems limiting their performance. It is shown in the literature that the capacity gap between the teacher and the student networks can make KD ineffective. Additionally, existing KD techniques do not mitigate the noise in the teacher's output: modeling the noisy behaviour of the teacher can distract the student from learning more useful features. We propose a new KD method that addresses these problems and facilitates the training compared to previous techniques. Inspired by continuation optimization, we design a training procedure that optimizes the highly non-convex KD objective by starting with the smoothed version of this objective and making it more complex as the training proceeds. Our method (Continuation-KD) achieves state-of-the-art performance across various compact architectures on NLU (GLUE benchmark) and computer vision tasks (CIFAR-10 and CIFAR-100). Comment: Published at EMNLP 2022 (Findings

arXiv.org e-Print Archive
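
As a rough illustration of the teacher-to-student transfer described in the Continuation KD abstract above, the sketch below uses the classic temperature-softened distillation loss together with a hypothetical schedule that moves from smooth targets to sharper ones. This is a stand-in, not the objective proposed in the paper: the function names, the linear schedule, and the default temperatures are all assumptions, and the usual temperature-squared loss scaling is omitted for brevity.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature):
    """KL divergence from softened teacher outputs to softened student outputs."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def annealed_temperature(step, total_steps, t_start=4.0, t_end=1.0):
    """Hypothetical linear schedule: smooth targets early, sharper targets late."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start + frac * (t_end - t_start)


teacher = [2.0, 0.0]
student = [0.5, 0.5]
early = kd_loss(student, teacher, annealed_temperature(0, 10))
late = kd_loss(student, teacher, annealed_temperature(10, 10))
```

Here `early` is computed against flatter teacher targets than `late`, which loosely mirrors the abstract's idea of starting from an easier, smoothed problem and increasing its difficulty as training proceeds.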

Mathematical Challenges in Deep Learning

Author: Asgharian Masoud
Chen Boxing
Hemati Sobhan
Kobyzev Ivan
Kong Linglong
Li Xinlin
Liu Wulong
Metel Michael R.
Nia Vahid Partovi
Sun Ke
Zhang Guojun
Publication date: 24/03/2023

Deep models have dominated the artificial intelligence (AI) industry since the ImageNet challenge in 2012. The size of deep models has been increasing ever since, which brings new challenges to this field with applications in cell phones, personal computers, autonomous cars, and wireless base stations. Here we list a set of problems, covering training, inference, generalization bounds, and optimization, with some formalism to communicate these challenges to mathematicians, statisticians, and theoretical computer scientists. This is a subjective view of the research questions in deep learning that benefit the tech industry in the long run.

arXiv.org e-Print Archive